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Abstract 


The prediction of anomalies or adverse events is a challenging task, and 
there are a variety of methods which can be used to address the problem. 
In this paper, we introduce a generic framework developed in MATLAB® 
called ACCEPT (Adverse Condition and Critical Event Prediction Tool- 
box). ACCEPT is an architectural framework designed to compare and 
contrast the performance of a variety of machine learning and early warn- 
ing algorithms, and tests the capability of these algorithms to robustly 
predict the onset of adverse events in any time-series data generating 
systems or processes. 
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Acronyms 


ACCEPT 

Adverse Condition and Critical Event Prediction Toolbox 

MSET 

Multivariate State Estimation Technique 

SVR 

Support Vector Regression 

ELM 

Extreme Learning Machine 

BNN 

Bagged Neural Network 

k-NN 

k-Nearest Neighbor 

NMSE 

Normalized Mean Squared Error 

RMS 

Root Mean Square 

KS 

Kolmogorov-Smirnov 

PCA 

Principal Components Analysis 

DBM 

Deep Boltzmann Machine 

DBN 

Deep Belief Network 

DNN 

Deep Neural Network 

CNN 

Convolutional Neural Network 

RNN 

Recurrent Neural Network 

LSTM 

Long Short Term Memory Network 
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Linear Dynamical System 

EM 
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Numerical Algorithms for Subspace State-Space System Identification 

AIC 

Akaike Information Criterion 

ROC 

Receiver Operating Characteristic 

AUC 

Area Under the ROC Curve 

SPRT 

Sequential Probability Ratio Test 


Nomenclature 


k 

Time index 

T 

Total number of time indices spanned by data 

d 

Prediction horizon 

L 

Critical threshold 

U k 

Feature vector 

U i 

Support vector 

P 

Number of features/parameters 

Zk 

Target parameter 

Vk 

Measurement Noise 

/(■) 

Support Vector Regression (SVR) nonlinear mapping function 
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Number of support vectors 

Cti, &i 

Support vector weighting coefficients 

P 

SVR bias offset parameter 

C 

SVR user-specified regularization parameter 

V 

SVR regularization parameter 

</>(•) 

SVR image transformation 

<v> 

Kernel function 

a 

Kernel width 

q 

SVR primal optimization variable 
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SVR slack variables 

SVR residual and Linear Dynamical System (LDS) output 

LDS output value prediction i time steps into the future 

Target parameter estimate 

Process/input noise 

State vector 

LDS system matrix 

LDS output matrix 

Process noise covariance matrix 

Measurement noise variance 

State estimate 

State size 

Euclidean norm 
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Universe of all possible events 
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Transpose 
Probability 
Expected Value 
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Gaussian distribution evaluated at x with mean ^ and covariance E 
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1 Introduction 

One of the objectives we planned to achieve in the process of developing 
a framework to test adverse event prediction algorithms was to allow for 
the prediction to occur within a reasonable time horizon of an actual 
adverse event, with low false positive and missed detection rates. Thus, 
we introduce a generic framework developed in MATLAB®, ACCEPT 
(Adverse Condition and Critical Event Prediction Toolbox), which is 
explicitly geared to providing a comparative performance assessment of 
results from the application of a variety of algorithmic methods. 

An initial step towards the goal of developing a new, state-of-the-art 
forecasting technology based upon this framework was presented in |1|. 
It is thus our aim in this paper to document continued steps towards 
achieving these goals, but with a broader focus on performance assess- 
ment. Here we use a variety of regression techniques in the context of a 
rigorous, formal analysis that involves cross validation. The regression 
techniques are used within an architectural framework to be described 
shortly, which act as a preprocessor for transformation of data into a 
form that is amenable to modeling for use in the context of applying 
hypothesis tests to the distribution formed by the resulting residual. 

The rest of this paper will be organized as follows. Sec. [2] details 
ACCEPT’s overall architectural framework. In Sec. [3] we review the ar- 
chitecture in the context of signal flow. In Sec. [4] we provide a discussion 
of optimization methods used for cross-validation and a detailed back- 
ground of all selected regression methods. In Sec.[5j we discuss optimiza- 
tion for all detection methods through validation for the best possible 
performance in generation of the results. We then offer concluding re- 
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marks in Sec. [6| Details on various supplementary toolbox requirements 
are provided in Appendices [A] and [B] Configuration and set-up of AC- 
CEPT are provided in Appendix |C| A developer’s guide is provided in 
Appendix [DJ Finally, demonstration of ACCEPT on a sample problem 
is provided in Appendix |E| 

2 The Architectural Framework for ACCEPT 

The architecture we use is patterned after MSET (Multivariate State 
Estimation Technique) , as documented in [ 2 1 . MSET represents the cur- 
rent state-of-the-art in prediction technologies and is used ubiquitously 
in nuclear applications, as well as aviation and space applications |3|. 
The architecture of ACCEPT has the same basic structure as MSET (as 
shown in Fig. [I]), but involves both a regression step and a detection step 
(shown in the dotted boxes, respectively). The regression step is typi- 
cally implemented with the aid of a machine learning technique and the 
detection step tests a set of fixed hypotheses relating to the statistical 
properties of the resulting residual. 

ACCEPT tests theoretical assumptions that differ from those made 
in MSET by appealing to the use of various detection and regression 
techniques used to parameterize the underlying models shown in the 
framework architecture (see Fig. [I]). MSET is restricted to the use of a 
set of proprietary nonlinear kernel-based regression operators that yield 
specific properties necessary for the statistical hypothesis tests governing 
early detection. As such, for MSET, detection or classification which 
appeals to hypothesis testing methods is not model-based, nor does it 
rely on the use of machine learning techniques. In contrast, our aim is 
to consider detection techniques less restrictive in their assumptions. 

As shown in Fig. [lj all data will be preprocessed and filtered using 
z-score normalization and feature selection, respectively. We employ to 
the idea of “boosting” in order to allow for the use of the residual gener- 
ated from the base model by the Kalman filter. In Fig. [lj the regression 
toolbox shown within the dotted box on the left contains many alterna- 
tive methods from which to generate the base model, which processes a 
select number of parameters (the “multivariate time series”) and maps 
them to a distinct target parameter. This static nonlinear mapping char- 
acterizes the basic relationship between the features or input variables 
and the target parameter or response variable. 

It is important to note that an unsupervised machine learning ap- 
proach is employed for this architecture, meaning that no labeled data 
is used to supervise the process of model learning. As such, all training 
data associated with the regression step is by definition nominal data. 
Anomalous data is reserved solely for validation and testing purposes, 
and does not influence the model characterized by the regression step 
described above. In this way, two distinct classes of machine learning 
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algorithms, regression and classification, are employed within ACCEPT. 
Classification methods based upon hypothesis tests are used to deter- 
mine if any novel, anomalous data is out of family with respect to the 
regression model characterizing the nominal training data. 

2.1 Step 1: Regression Toolbox 



Detection 

Toolbox 


Figure 1: Functional Architecture, ACCEPT (Adverse Condition and 
Critical Event Prediction Toolbox) 


The residual output from the regression toolbox quantifies the dif- 
ference between the actual value of the target parameter and the value 
predicted by the base rnodeQ The regression techniques are applied 
in the context of a rigorous, formal analysis that involves cross valida- 
tion for an objective function that represents regression performance as 
quantified by the NMSE (Normalized Mean Squared Error) of resulting 
residuals. They also act as a preprocessor for transformation of data 
into a form that is amenable to modeling in the context of applying 
hypothesis tests to the distribution formed by the resulting residual. 

All detection methods will conform to a rigorous process that is based 
upon examples of the candidate adverse event contained in validation 
datasets. This is followed by alarm system design and final testing with 

lr rhis mapping should have a functional basis specific to the adverse event scenario 
defined within the context of a particular application, for which any reasonably robust 
regression approach can be used. 
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a hold out data set, which in theory should be drawn from the same 
distribution as the validation data. 

A single target parameter that acts as a global health indicator for a 
subject system or process rarely exists in reality. In fact, there may be 
multiple such indicators for each adverse event or anomalous operation 
that is a candidate for prediction. Thus, we may train as many regressors 
as there are available to characterize target parameters. It is equally 
important to ensure that the regressors are adequate predictors of the 
target parameters through a rigorous feature selection process that is 
based upon assessing the strength of correlations between these sets of 
parameters. 

2.2 Step 2: Detection Toolbox 

The dotted box on the right of Fig. [l] represents the detection portion of 
the architecture, which tests the residuals that are produced from Step 1. 
The residual may be distributed in such a way that is amenable to mod- 
eling as a Gaussian distribution. It can be used as the basis for learning a 
linear dynamical system, which is the first block in the detection portion 
of the architecture (labeled as “Kalman filter”), and needed for all but 
one of the prediction methods to be tested. The linear dynamical system 
implicitly characterizes the unmodeled dynamics unable to be captured 
by the regression method, which can subsequently be used for the design 
of an alarm system based upon different statistical hypotheses. 

The optimal alarm system tests the level-crossing prediction hypoth- 
esis, i.e. extreme values associated with the tails of the distribution 
formed by the residual. The predictive alarm system also relies on using 
the model provided by the linear dynamical system. The other alarm 
systems requiring the linear dynamical system belong to the class of de- 
tection methods that test shifts in the first and second moments of the 
distribution formed by the residual. An empirical variant of the only 
other alarm system (the “redline alarm system”) does not require the 
linear dynamical system, however, there is a model-based variant which 
does rely on it. Both variants are illustrated by two inputs into the block. 
Also, two equivalent variants are available for the optimal and predictive 
methods, although only one arrow is shown in the diagram for clarity. 


3 Signal Flow 

Fig . [T] is a functional representation of the architecture, meaning that the 
arrows in the diagram represent batch data transfer, rather than signal 
flow or real-time signal processing. This paradigm is used to emphasize 
the fact that learning takes place in two stages, due to the serial nature 
of the architecture. The architecture involves two distinct algorithms 
which both involve machine learning, and are necessary to fulfill the 
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desired objective of recasting the adverse event prediction problem into 
a form whose solution is accessible as a hypothesis-testing problem. 

Fig. [2] depicts a signal flow representation of the problem. The linear 
dynamical system shown in the dotted portion of the diagram can loosely 
be thought of as an alternate representation of the regression block. 
More explicitly, it is trained to mimic the statistical properties of the 
regression residual (the unmodeled dynamics), such that yk = %k — Zk, 
where yk represents both the residual and the output at time k, and Zk 
represents a target parameter that characterizes a specifically designated 
adverse event or anomaly. Note that the regression block takes as input 
a multivariate vector of features, u*., and outputs an estimate of this 
target parameter, Zk- Note also that the Kalman filter block takes as 
input the input or process noise, w and measurement noise, Vk- 



Prediction result 


Figure 2: Signal Flow Diagram 

The Kalman filter block accepts input from the regression block as yk, 
processes these residuals as observables, and computes state estimates, 
Xju fc , based upon these observations during the correction step. These 
state estimates can then be used to form forward projected residual value 
predictions, spanning the prediction horizon, Vi G {1 

These are used either by the predictive or optimal level-crossing alarm 
system to initiate an alarm to forewarn of a level-crossing associated 
with the threshold value L. The alarm design parameters for both the 
predictive and optimal level-crossing alarm systems are the threshold La 
and border probability Pj,, respectively. The redline alarm system can 
also be designed with the threshold La as an early warning for the more 
consequential level-crossing event associated with the threshold, L. Both 
alarm system design parameters are represented with dotted lines in Fig. 
[2} More detail on these detection methods will be provided in Sec. [5j 
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The Kalman filter block also generates an innovation signal, £&, which 
represents a secondary residual in the context of “boosting” mentioned 
previously. Due to the spectral properties of this innovation signal ( i.e . 
whiteness), it can also be used for testing various Sequential Probability 
Ratio Test (SPRT) hypotheses. This class of prediction methods tests 
shifts in the first and second moments of the distribution formed by the 
innovation signal, however it requires the residual to exhibit the specific 
properties of whiteness and Gaussianity. 

The SPRT tests, predictive, and optimal level-crossing event predic- 
tors, as well as the Kalman filter and Kalnab predictor all fundamentally 
use the LDS parameters, 9, which have been learned during the train- 
ing process. The only detection method shown in Fig. [2] not using the 
linear dynamical system parameters is an empirical variant of the red- 
line alarm system, which uses only the current residual value, y^, and 
relies on use of the alarm design threshold La, similar to the one used 
for the predictive alarm system. Note that there are two variants for 
the redline, predictive, and optimal alarm systems, and the distinctions 
between these two will be provided in further detail in Sec. [5] 


4 Regression Toolbox 

Recall that there are a variety of regression techniques to be tested in 
the context of a rigorous, formal analysis that uses nominal training data 
which is partitioned into multiple folds for the purposes of /-fold cross 
validation. As previously mentioned, the NMSE will act as the objective 
function used for cross validation. R is a traditional metric for assessing 
the goodness-of-fit of a regression algorithm via the residual and is often 
a better choice than using the RMS (root-mean square) due to the ability 
to compare regression algorithms more equitably. 

The ability to capture system dynamics is governed by the NMSE 
metric. For infinitesimally small NSME values, the spectral character- 
istics of the resulting residual should be white, and all of the dynam- 
ics embedded in the residual can be explained by the regression model. 
Thus, in a sense model fidelity characterizes the ability of the regression 
algorithm to learn all system dynamics (both linear and nonlinear). All 
prediction methods with the exception of the standard exceedance pre- 
diction method attempt to exploit the fact that the regression method 
will most likely not be able to explain all of the dynamics embedded in 
the residual. The residual itself will most likely contain linear dynamics 
that are driven by Gaussian noise. However it is still possible that the 
unexplained dynamics may contain nonlinearities which are unable to be 
captured, which is one limitation of assuming linear dynamics driven by 
Gaussian noise. 

The NMSE governs model fidelity, but it has also been empirically ob- 
served that it is also heavily correlated to a metric called the Kolmogorov- 


Smirnov (KS) statistic. The KS statistic is fundamentally a measure of 
Gaussianity of a distribution, by finding the maximum difference between 
an empirical cumulative distribution function based upon the residual 
and the normal distribution with mean and standard deviation given by 
the same residual data. This empirical observation will be studied in 
a more theoretically rigorous manner in a subsequent paper. Since the 
KS statistic governs a property assumed by almost all of the prediction 
methods to be tested, it is very convenient that it is very well correlated 
to the NMSE metric. 

Formally, the optimization used for regression can be posed as shown 
in Eqn. [I] with the NMSE metric as the objective. The optimization 
problem shown here is essentially the result of a /-fold cross validation. 
Note that the index i is used for an individual validation time-series 
data record, where the index k is the time index, as previously refer- 
enced. A standard grid search is used to optimize the NMSE over the 
hyperparameter specific to the regression method being evaluated. 


where 

/ = 

Ti = 

Zi,k — 

Z = 

Vi,k{ 

A = 

5 = 

A review of all the regression methods to be used are provided in the 
list below, and is associated with the highlighted block of the architecture 
previously shown as Fig. [TJ shown again below as Fig. |3j 

• Support vector regression (SVR, the technique adopted in |1| in- 
troduced by (4] ) 

• k- nearest neighbor regression (k-NN) 


minimize 

subject to 
X € S 


J = 


Ef=i ElLi vlkW 


Zi= 1 Zk=l %i, 


(i) 


number of folds used for cross-validation 
(number of records in validation dataset) 
number of time points in i th time-series data record 

%i,k (^) 


Zi,k - Z 

ZL EfcLi * 


i.k 


yf t 

2-a=l 

A) 

Regression-specific hyperparameter 
Regression-specific hyperparameter tuning domain 
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• ^2 regularized linear regression (LR) 

• Bagged neural nets (BNN) 

• Extreme Learning Machines (ELM) 

• RANdom SAmple Consensus (RANSAC) 


Continuous 
Training 
Data _ 


Multivariate Time series 


-+ Preprocessing and Feature Selection «- 


Training Data 


Static Regression 


SVR, k-NN, BNN 
Linear, ELM, 
RANSAC, etc 



All Data 


Continuous 
Validation & 
Test Data 


AR/ARMA 


DBM, DBN, 
RNN, CNN, 
DNN, LSTM, etc. 


Regi'ession 

Toolbox 


Learning 
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Filter 


Future 

Predicted 

Process 

Values 


Signal 


Output 

Process 


Level-Crossing 

Prediction 

Optimal Alarm 
System 


Predictive 
Alann System 

SPRT 

Flypotheses 

Redline Alarm 
System 


Detection 

Toolbox 


Figure 3 : Functional Architecture, ACCEPT (Adverse Condition and 
Critical Event Prediction Toolbox), Regression Block Highlighted 


4.1 Support Vector Regression (SVR) 

In this subsection, we provide a brief description of the 77— support vector 
regression algorithm that is used for target prediction as documented 
in Q. Recall that MSET contains a proprietary set of kernel-based 
nonlinear operators designed to yield residuals with white noise spectral 
characteristics. However, support vector regression offers an improved 
ability to reduce target value prediction error compared to MSET. It 
also offers better built-in controls for complexity and numerical stability. 
This was found in a nuclear application studied in | 5 j. Given a finite set 
of multivariate observations, it is possible to reconstruct an input and 

target set that takes the form as shown in Eqn.J^J where U= [uo . . . uy’] T 
is an input data matrix of size (T x p) and the corresponding output is 

denoted by z = [zq . . . zt] T , termed as the target vector. Thus, there are 
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p parameters and T observations. Once the p — support vector regression 
algorithm is appropriately trained, it is possible to estimate a target 
function /( Ufc) that has at the most a deviation of p from the actual 
observed targets {zk} k=0 for all the input data {ufc E M p }^ =0 . 

z = /( U) (2) 

The target function /(•) is basically a linear combination of weighted 
similarities between some chosen training points and the test points with 
an additional offset which is often known as bias, p. The chosen train- 
ing instances with m non-zero weights are called support vectors (SVs, 
Uj) and they are the representatives of the model. This implies that, 
given the model, any training points apart from the SVs are of no im- 
portance and can be thrown out without changing the performance of 
the algorithm. The target function is shown in Eqn. [3j 

m 

f (ufc) = y 1 (o=t - q») (u», Ufc) + P (3) 

2—1 

The support vectors and their corresponding weights, ai and cb result 
from the solution of a quadratic programming optimization problem in 
dual form. The expression of the primal problem is shown in Eqn. [4j 
Further details on the cost function and optimization problem can be 
found in j4j . 


minimize P (q, C, ) = ^qq T + C ^ ) 

k = 0 

subject to (z k - q T c/>(u fc ) - p) < p + ^ 

(• - q T <Ku k) - pj >v + tk 

e fe + ,4">o 

C > 0 (4) 


C and p are user specified regularization and precision parameters 
respectively. They are chosen according to the practical guidelines set 
forth in |6|. £ + , are non-zero slack variables, q is the weight vector 
normal to the separating hyperplane, p is the offset parameter, <f>(u k ) 
represents the transformed image of E in the same Euclidean 
space, and k E [0, . . . ,T\. Throughout this research we have used the 
Radial Basis Function (RBF) as the mapping function given in Eqn. [5j 
where a represents the hyperparameter of the Gaussian function. 


(uj,u fc ) = exp 



Ufc - Uj 


(7 


2 



( 5 ) 
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The parameter a, also known as the “kernel width” parameter, con- 
trols the overall scale in horizontal variations and acts as the hyperpa- 
rameter associated with the optimization problem posed in Eqn. [l] The 
implementation of SVR in ACCEPT is based on the libsvm package [ 7 ] 
and the default settings for gaussian kernel parameter is used. The cost 
parameter C is considered the only hyper-parameter for SVR, and is 
optimized based on cross-validation. 

4.2 k-nearest neighbor regression (k-NN) 

As described in 1 8] , k-NN, or k-nearest neighbor regression is most useful 
for mapping data that is often represented in high-dimensional spaces to 
lower dimensional manifolds. As such, it is often used for contrast to 
linear dimensionality reduction techniques such as PCA (Principle Com- 
ponents Analysis). k-NN regression assumes that points in the data 
space that are located in close proximity to each other have similar out- 
put values. As such, for novel data presented to the regression algorithm, 
the output values must be located in close proximity to those k near- 
est points having similar patterns in higher dimensional space, and so 
the hyperparameter used for k-NN regression is the number of nearest 
neighbors. More details on this regression method can be found in |9|. 

4.3 I 2 regularized linear ridge regression (LR) 

One of the regression methods in ACCEPT’s regression toolbox is linear 
ridge regression. Tikhonov regularization is used, and the associated 
regularization coefficient is used as a hyperparameter to optimize the 
NMSE as a function of how well conditioned the solution should be. 
A well-posed or well-conditioned problem is one that yields a solution 
that meets the criteria of existence, uniqueness, and robustness (e.g. 
low sensitivity to natural variation in the data). The linear regression 
techniques to be evaluated are linear in the parameters to be estimated 
only, however basis functions for the regressors themselves to be studied 
will involve both affine and quadratic regressors in the same vein as was 
presented in 8], 1 10 1 , and (l 1 1 . Let u, represent a vector of regressors 
for observation i, then scalar elements of the vector can represent linear 
regressors, Ui , and/or quadratic regressors, uf. Note that the use of 
quadratic regressors include the uf regressors as well as the product of 
all ( {/) quadratic pairs of parameters, tq and Uj. 

4.4 Bagged neural nets (BNN) 

In this study we only consider traditional neural networks with a sin- 
gle layer perceptron as distinct from “deep learning” techniques which 
exploit the use of multiple hidden layer^] In this work, a single layer 

2 In future releases of our regression toolbox within ACCEPT, many variants of deep 
learning will be considered, including e.g. DBMs (Deep Boltzmann Machines), DBNs 
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perception refers to a network architecture that contains a single hid- 
den layer in addition to separate input and output layers, similar to the 
structure illustrated in the subsequent subsection (c/. Fig. [4]). Funda- 
mentally, neural networks offer the ability to capture nonlinearities in 
data by learning weights associated with nodes in a network that are 
linearly combined and ultimately transformed through a nonlinear map- 
ping. 

More details on neural networks and the “bagging” process which 
aims to aggregate ensembles of neural networks trained on datasets gen- 
erated from the same source can be found in other work 12 . “Bagging,” 
or bootstrap aggregation, is meant to address deficiencies in stability 
and accuracy of the neural net performance and also reduces variance 
to help prevent overfitting. However, due to the lack of complexity for 
this particular dataset, a single base model was used and no bagging was 
necessary. The hyperparameter used for neural nets (NN) regression is 
the number of hidden neurons in a single layer perceptron. 


4.5 Extreme Learning Machines (ELM) 

Extreme Learning Machines (ELMs) are emerging as an alternate learn- 
ing paradigm for multi-class classification and regression problems 13 
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and have outperformed some state of the art algorithms such as 
backpropagation neural nets, support vector machines, etc. ELMs have 
a basic architecture similar to that of a single layer feedforward neural 
network, but the input layer parameters are assigned randomly (sampled 
from any continuous random distribution). In such a setup, the output 
layer parameters are the only unknowns of the model, and can be ana- 
lytically determined using a linear least squares approach (See Fig. [4]), 
making ELM training extremely fast. Some of the attractive features of 
ELM include the universal approximation capability, better generaliza- 
tion, the convex optimization problem of ELM resulting in the smallest 
training error without getting trapped in local minima, and availability 
of a closed form solution eliminating iterative training |l3| . 

Consider the following data set 


{(xi,yi),...,(x N ,y N )} <E (X,y), 


( 6 ) 


where N denotes the number of training samples, X denotes the space 
of the input features and y denotes labels whose nature differentiate 
the learning problem in hand. For instance, if y takes integer values 
{1,2,3,..} then the problem is referred to as classification and if y takes 
real values, it becomes a regression problem. ELMs are well suited for 
solving both regression and classification problems and training is faster 
than state of the art algorithms 14 . ELM has been applied to several 


(Deep Belief Networks), DNNs (Deep Neural Networks), CNNs (Convolutional Neural 
Networks), RNNs (Recurrent Neural Networks - AR/ARMA-style), and LSTMs (Long 
Short Term Memory Network - AR/ARMA-style) 


13 
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Figure 4: Extreme learning machine model structure. 


benchmark problems 1 13 , 14 1 , system identification [15]|, diagnostics 16 


17 and control problems 
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with good success. When training data 
is available and a model is required to be learned using all the training 
data, a batch training approach is adopted which involves solving the 
following optimization problem 


mm ■ 

w 


\HW -Y\\ 2 + X\\W\\ 2 } 


4> = H t = iJj{Wjx(k) + b r ) £ M nh 


X 1 


( 7 ) 

( 8 ) 


where A represents the regularization coefficient, Y represents the vec- 
tor of outputs or targets, represents the hidden layer activation func- 
tion (sigmoidal, sinusoidal, radial basis etc [l4|) and W r £ W liXnh , W £ 
W lhXyd represents the input and output layer parameters respectively. 
Here, n* represents the dimension of inputs x(k), rih represents the num- 
ber of hidden neurons of the ELM model, H represents the hidden layer 
output matrix and yd represents the dimension of outputs Y. The matrix 
W r consists of randomly assigned elements that maps the input vector 
to a high dimensional feature space while b r £ W lh is a bias component 
assigned in a random manner similar to W r . The number of hidden neu- 
rons determines the expressive power of the transformed feature space. 
The elements can be assigned based on any continuous random distribu- 
tion |14| and remains fixed during the training process. Hence, training 
reduces to a single step calculation given by equation Q. The ELM 
decision rule can be expressed as in equation ( 10 ) for classification and 
equation © for regression. 


W* = [H t H + A/) -1 H t Y 


(9) 
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f(x) = sign {W T [^(Wj x + b r )]) . 
f(x) = W T [ip(Wr x + b r )} 


(10) 

( 11 ) 


Since training involves a linear least squares solution with a convex 
objective function, the solution obtained by ELM is extremely fast and is 
a global optimum for the chosen nh, W r and b r . The number of hidden 
neurons nh is considered the only hyper-parameter that is optimized 
based on cross-validation. The other parameters of ELM such as the 
regression coefficient A, random parameters W r and b r are fixed to default 
values. The user is free to modify these values to adapt to the data in 
ACCEPT. 


4.6 RANdom SAmple Consensus (RANSAC) 

Typical regression methods are based on assumptions which meet with 
success if they hold true. However, if these fundamental assumptions 
no longer hold, they can lead to misleading results as a consequence of 
the fragility of a particular regression method. RANSAC is a robust 
regression method that is not excessively affected when gross errors and 
outliers exist in the data [l9|. These outliers may be introduced as a 
result of incorrect measurements or excessive noise. These errors should 
be rejected since they do not fit the model and can cause improper 
parameter estimation, and is a technique that the RANSAC regression 
algorithm employs. The RANSAC algorithm is iterative and comprised 
of two main steps which are repeated until a sufficient model is generated 
19 . 


1. Create a Hypothesis 

RANSAC randomly selects N data points from a uniform distri- 
bution to instantiate a subset, Si, from the input data set U, de- 
fined as U = {ui, . . . ,ujv}- Elements of the the input vector 
are defined as u J k , and each individual element is distributed as 
u{ ~ U[0, 1]. Contrary to orthodox sampling methods, RANSAC 
uses the smallest subset of data (Minimal Sample Set (MSS)) pos- 
sible to construct a model M\ by calculating the model parameters 
using the subset S\ . For example, using a simple deterministic lin- 
ear model representation, u J k = ax\ + bx 2 + c, M\ = [a, b, c] T and 
N = 3 since three data points are required to uniquely determine 


the parameters of a linear model 20 


2. Evaluate the Hypothesis 

A point is considered an inlier if it agrees with the hypothetical 
model within an error threshold, 6. Using more data points than 
the MSS for this model would decrease the probability that the 
chosen points are all inliers. A new subset (consensus set), Sf, is 
produced by computing all the points that agree with the current 
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model, M\. RANSAC now assess the value of the model created 
on the complete data set using a cost function. RANSAC seeks to 
minimize the cost function j;20i| j 2l] : 


N 

C A (W;0) = 5>(ui,A(0)) (12) 

i = 1 

using 


where 


P( u,A(0)) 


0 |5a(u,0)|<5 

constant otherwise 


(13) 


A 

N 

U 

e 

h 


za{u,9) 


= the model space 
= number of data points 
= input data set {ui, . . . , ujv} 

= ({ui, • ■ • , u/,}), estimated parameter vector 

> the cardinality of the MSS 
= the error associated with a data point u 


If the cost is below a predefined value, 5, i.e., the current model 
comprises a sufficient number of inliers, then a new model M* is 
generated using the new subset, S*. If the consensus set has the 
lowest cost, C\(U',6), it is used as the leading candidate for this 
iteration of the algorithm. This process is repeated for a certain 
number of iterations, /. Given that ip is the probability that the 
algorithm will fail at least once, ui is the probability a point is an 
inlier and M is the number of points required to generate hypoth- 
esis 1 20 1 19 


log <P 

log(l — UJ M ) 


(14) 


After I iterations, the model with the lowest cost out of all leading 
candidates is selected as the final model. The hyperparameter used 
for RANSAC is the error threshold, 5. 


5 Detection Toolbox 

As distinct from regression, all detection methods will conform to a rig- 
orous validation process that involve both nominal training data and 
validation data containing examples of the adverse event in question. 
The detection methods can be split into the three types listed below. 
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1. Methods that use a Monte-Carlo style implementation to empiri- 
cally generate relevant alarm system statistics 

2. Methods that rely on a model-based approach to generate these 
same performance metrics 

3. A method that uses an optimal stopping rule based upon the test 
of hypotheses associated with abrupt changes in residuals of the 
modeled process output 


For the detection methods that fall into one of the first two cate- 
gories listed above, ROC curve analysis will be used to enable the de- 
sign of tradeoffs between false alarm and missed detection probabilities. 
A user-specified condition of acceptance based upon missed detection 
and/or false alarm rates will be established, and it will be used to de- 
sign an alarm system based on the training or validation ROC curve. 
The resulting threshold will be used to implement the detection method 
using the test data, illustrating all observed false positives and missed 
detections. These methods are based upon the use of a decision rule 
or hypothesis test which involves level-crossing behavior of the modeled 
process (see Sec. 5.3 for further detail). Strict definitions for the detec- 
tion performance metrics are provided in the list below. These definitions 
may vary slightly from standard definitions that are provided in the con- 
text of single time points alone. The definitions provided are based upon 
ground truth that spans a prediction horizon with multiple time points. 


False Alarm An alarm is triggered at a time point that does not con- 
tain an example of a confirmed anomalous event in at least one 
time point in the next d time steps, in a manner similar to the 


level-crossing definition provided in Sec. 5.3 


Missed Detection No alarm is triggered at a time point where an ex- 
ample of a confirmed anomalous event exists in at least one time 
point in the next d time steps, in a manner similar to the level- 


crossing definition provided in Sec. 5.3 


For the detection methods that involve level-crossing prediction of 
the modeled process, it is important to select the threshold associated 
with those level-crossings according to some quantifiable metric. For- 
mulating an optimization problem emerges as a natural way to address 
the need to “tune” the threshold associated with the level-crossing event 
to the adverse events found in the validation data. Furthermore, both 
the prediction horizon associated with the level-crossing event and the 
dimension of the state-space, where applicable, are parameterized as an 
implicit function of the objective function associated with this optimiza- 
tion problem. In that way, design of the alarm system can be based 
upon LDS model parameters derived from training data and the level- 
crossings that provide the best representation of violations. Alternately, 
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these level-crossings be identified in advance, and act as adverse events 
to be derived from validation data. 

A formal representation of the objective function is provided in Eqn. 
which is based upon the existence or absence of anomalous events, 
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and accompanying alarms associated with the level-crossing threshold 
L, both encoded as binary vectors for all records in the entire validation 
data set. 


Note that the objective function in Eqn. 15 involves the AUC and is 
optimized over a countably finite two dimensional grid n,d E Z + . The 
AUC is based upon using validation data as the ground truth and values 
of \yk\ as a source of candidate thresholds to yield L. It is also important 
to note that the AUC optimization procedure is only the first step in the 
threshold selection process, and involves only finding the state dimension 
and prediction horizon that yield the highest AUC value. Ultimately the 
goal is to select a threshold which yields the most accurate representation 
of the ground truth. Thus, the second step is to use the respective ROC 
curve as the basis of this selection process. There are many different 
criteria which can be used in the context of ROC curve analysis, however 
we employ the EER (equal error rate, where Pfa = P m d ) as the criterion 
for acceptance. 


maximize AUC 

subject to (15) 

n,d £ Z+ = [1,2, . . .] 

A summary of all the prediction methods to be discussed in this 
section are provided in the list below, and can be referenced in the high- 
lighted block of Fig. [5} 

• Standard exceedance 

• Redline alarm system 

• Predictive alarm system (Monte Carlo simulation) 

• Predictive alarm system (Numerical integration) 

• Optimal alarm system (Monte Carlo simulation) 

• Optimal alarm system (Numerical integration) 

• SPRT tests 


5.1 Standard Exceedance and Predictive Alarm System 

The objective of the simplest standard exceedance detection method 
is to obtain a threshold, La from alarm system design based upon a 
Monte-Carlo style implementation to empirically generate relevant alarm 
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Figure 5: Functional Architecture, ACCEPT (Adverse Condition and 
Critical Event Prediction Toolbox), Detection Block Highlighted 


system statistics (first from the preceding categorical list) to use for final 
testing. The standard exceedance method is also the only detection 
method not using LDS parameters, but shares a similar alarm design 
threshold, La , with the predictive alarm system that does require them. 
As such, the predictive alarm system requires the optimization problem 
posed in Eqn. 15 Both the standard exceedance and predictive detection 
methods also share a common alarm system design philosophy. Their 
design is based upon a Monte-Carlo simulation to empirically generate 
relevant alarm system statistics from realizations that use real validation 
in the attempt to define an envelope, [—La, La], outside of which an 
alarm will be triggered to forewarn of the impending level-crossing event. 
Formally, the predictive alarm system is given by {\yk+d\k\ > La}, such 
that it is based upon a predicted future process value, and the standard 
exceedance method is given by \yk\ > La- 


A “redline” alarm system can also be designed using the same alarm 
rule: \yk\ > La- However, the distinction between the redline alarm 
system and the standard exceedance method is that the former relies 
on sufficient statistics used to estimate the LDS parameters that come 
strictly from training data. In this case, an alarm design threshold, La, is 
used as a predictor of a second more critical threshold, L. Here, L is the 
same threshold tuned to the observed adverse event via the optimization 
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Figure 6: Functional Architecture, ACCEPT (Adverse Condition and 
Critical Event Prediction Toolbox), Redline and Predictive Block High- 
lighted 


problem shown in Eqn. 


E 3 


Similarly, the predictive alarm system can 


also be based on sufficient statistics used to estimate the LDS parameters 
that come strictly from training data. More detail on both these redline 
and predictive alarm system variants is provided in 1 22 1 . 


5.2 Linear Dynamical System 

With one exception all detection methods in the detection toolbox re- 
quire the use of the linear dynamical system. Linear dynamical systems 
evolve according to Eqns. 


\m- is 


demonstrating propagation of the 
state, Xfc E M n which is corrupted by process noise w*, E M n . For conve- 
nience of presentation, it will be assumed that propagation of all state 
covariance matrices can reasonably well be approximated by their steady- 
state counterparts. This approximation, while it introduces error with 
regards to the probability of a level-crossing event at a specific point in 
time, is ostensibly negligible and will provide for a great computational 
advantage in the design of an alarm system. Instead of designing an 
alarm system for each time step, a single alarm system can be designed 
for all time steps. The approximation is based upon the limiting statis- 


3 Note that the standard exceedance method uses La for tuning rather than L. For 
the redline alarm system, La is used solely for alarm system design. 
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Figure 7: Functional Architecture, ACCEPT (Adverse Condition and 
Critical Event Prediction Toolbox), Kalman Filter Block Highlighted 


tics that are reached at steady-state, which greatly reduces the compu- 
tational burden, as previously identified 22 . As such, the solution of 
the steady-state Lyapunov function, P xx , suffices for evolution of the 
unconditional state covariance matrix. The output, y k E M is univariate, 
and is corrupted by measurement noise dj. G 1. 


(16) 

(17) 

(18) 

where 

~ A7(0,Q), Q^O 
Vk ~ J\f(0,R), R > 0 


Xfc+i = Ax fc + 

Vk = Cx fc + v k 
Pxx = AP xx A T + Q 


Implementation of the optimal alarm system hinges on use of stan- 
dard Kalman filter and predictor equations which are omitted for the 
sake of brevity. However it is important to introduce relevant predicted 
future process output values, covariances and cross-covariances, given 
below as Eqns. 19- 21, respectively. These equations rely on Kalman 


filter formalisms, and will be used in subsequent formulae. P xx is the 
solution to the discrete algebraic Riccati equation (Eqn. 22), and P xx is 
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the steady-state a posteriori covariance matrix given in Eqn. 23 which 
both rely on the Kalman gain defined in Eqn. [24} The approximations 
shown in Eqns. 20 and [2l] will provide for a great computational ad- 


vantage in design of the optimal alarm system and its corresponding 
approximations for reasons stated previously. The assumption of sta- 
tionary is also required for the design of an optimal alarm system using 
this modeling paradigm (cf. Theorem from 1 22] ) , and holds here as well. 


Vk+j\k 

(19) 

P k+j]k « A^(P**-P xx )(A T y+P xx 

(20) 

Pk+i,k +j \k « A J (P X x — P XX )(A T )* + A J_ *1 

P xx (21) 

Pxx = A (P xx — FxxCP xx ) A T + Q 

(22) 

Pxx = Pxx - F XX CP XX 

(23) 

F xx = P XX C T (CP XX C t + R)~ 1 

(24) 

The parameters to be learned are specified in Eqn. 25 

as the pararn- 

eter 0. 


Q = (a*x j Po j -A- > C , Q , R) 

(25) 

where 


P k = Mx)(^/c Mx) ] 



There are three important considerations in learning 0. The first 
consideration relates to the method used for learning these parameters. 
One such data-driven approach incorporates the use of the Expectation- 
Maximization (EM) algorithm, which is an iterative maximum likelihood 
estimation-based approach that is ultimately an alternating nonlinear 
optimization problem. As such, it is possible to arrive at a solution 
which is only a local optimum, and there may be better solutions based 
upon the location of the initial parameters. 


In previous work 23 , the EM algorithm was used to learn the model 
parameters using a variety of initialization techniques. Furthermore, the 
fidelity of resulting models was assessed via a derivative of the Akaike 
Information Criteria (AIC), such as one presented by Bengtsson and Ca- 
vanaugh 24 , in addition to being used for model order selection^] In 


this paper we consider alternative approaches to initialization of the EM 
algorithm, based in part by a method known as Numerical Algorithms 
for Subspace State-Space System Identification (N4SID), documented in 
Overschee and De Moor |26f|~28] and Favoreel et al. |29j. However, us- 
ing this approach for initialization does not guarantee a globally optimal 


4 Note that here, the model order selection problem is handled entirely by the 


detection algorithm optimization problem cited in 25 . 
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solution. It is often suggested to use “random restarts” as a strategy 
for overcoming this limitation to find a near global optimum. Further- 
more, any of the methods listed as candidates for global optimization 
would work just as well to overcome this limitation. 


m 
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These will 

be studied in earnest in future papers. N4SID will be the approach used 
for initialization here, as it is often advocated to avoid undesirable local 
30 1, even though there is no guarantee of achieving a globally 


minima 
optimal solution. 

The second consideration relates to the enforcement of constraints 
for certain parameters in 9. For any useful linear dynamical system 
model, an important constraint on system stability is required, which is 
not guaranteed by any of the methods described in this section. Thus, 
the M-step of the EM algorithm must include a constraint such that the 
eigenvalues of A fall within the open unit disk, which can also be repre- 
sented as a constraint on the spectral radius, such that p(A) < 1. Recent 
work by Boots [3 1] will allow for enforcement of stability throughout the 
entire identification and learning procedure in the context of the EM 
algorithm, as originally suggested by Siddiqi et al. 32 . The constraint 
is enforced by the use of cvx [33] , 34 , a package for specifying and solv- 


ing convex programs, in the context of solving a quadratic optimization 
problem as posed in 1 31 1 . In some cases the data used for initialization 
of the EM algorithm may also yield an unstable linear dynamical sys- 
tem model such that p( A) < 1. As such, using N4SID for initialization 
will fail, and as an alternative the method by Siddiqi et al. [32] will be 
implemented. 

In a practical application on learning linear dynamical system pa- 
rameters by Derek et al. 
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it was found that numerical instabilities 
cause convergence problems when obtaining M-step estimates for all co- 
variance matrices 9 CV = (Pq,Q,R). This can in part be assisted by 


enforcing symmetry with the approximation X 


x+x 


However, in 


addition to enforcement of symmetry and stability constraints, numeri- 
cal issues can also be addressed by enforcing constraints on the positive 
definiteness of these matrices ( 9 CV ), which is also a requirement for im- 
portant control theoretic properties to hold true. This is easily achieved 
by using the following generic convex optimization formulation in Eqn. 


26 below, also using cvx. The constraint is enforced only at the final step 
of the EM algorithm, following the recommendation of 31 , which cited 


no observed additional increase in model fidelity or accuracy by perform- 
ing this often computationally intensive computation at each iteration 
of the EM algorithm. 


minimize tr(SX) — log det(X) (26) 

X ^ O 


where for Pq 
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X] y fc*5r 

k = 0 
T 

Y. P fc|T + Xfc|TX^| T 


fc =0 


A 


T - 1 


^ P fc+l,fc|T + Xfc+ilTX^T 


k=0 


r xx — r xx + r x ^(p T | T + x T | T xJ| T )r xx 


where 


x A: | T = E[x k \y Q ,...,y T ] 

p A:|T = -S[(xfc - hx)(x fc - /r x ) T |y 0 , • • • ,yr] 


The third consideration relates to scalability and computational com- 
plexity of the E-step within the EM algorithm. Machine learning tech- 
niques used for adverse event prediction should be highly scalable and 
capable of supporting a fleetwide analysis for future deployment. This 
is important when considering the vast quantities of fleet data that can 
be used to train a linear dynamical system model. However, it is equally 
important when training a model based upon a smaller representative 
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sample from the fleet. Both sample sizes are used in the context of the 
optimization problem for detection algorithms that rely on this mathe- 
matical construct. 

The optimization requires the pre-computation of LDS parameters 
with as many different model orders, n, d £ Z + = [1, 2, . . .] as are practi- 
cal to learn. Practically, constraints are often applied such that n < 10. 
The optimization problem (c/ |25|) can then be performed without hav- 
ing to learn LDS parameters redundantly, since many iterations are of- 
ten required to arrive at a near global optimum as a function of model 
order in addition to other relevant tuning parameters, as previously sug- 
gested^] However, finding LDS parameters for model orders ranging from 
n £ [2, . . . , 10] is still a computationally burdensome task, and as such 
complexity is important here as well. 

Martens [36] developed a novel algorithm to specifically address this 
issue, by making approximations based upon rigorous statistical and con- 
trol theoretic principles. These approximations are enabled in part by a 
set of recursions established for the aggregate sufficient statistics that are 
used as part of the M-step in the EM algorithm. Per iteration compu- 
tational complexity is consequently reduced from <D(n?T ) to 0(n 3 ku m ), 
where ki irn <C T and kn m is a user selected value. This provides an 
extremely valuable tool in surmounting an otherwise intractable com- 
putational bottleneck for very large datasets, and where applicable the 
approach can be used for this study. 

5.3 Optimal Level-Crossing Prediction 

As with the redline and predictive methods, the optimal alarm system 
also requires parametric tuning of n, L, and d using Eqn. [l5]to find the 
best final outcome. The ultimate goal of these optimizations is to close 
the gap between the level-crossing prediction hypothesis being tested and 
prediction of the actual adverse event. Details on alarm system design 
have been given in [22], and are based upon appropriate threshold selec- 
tion of Pb, given pre-established user-specified conditions of acceptance 
based upon missed detection and/or false alarm rates, and the resulting 
ROC curve. Just as with the predictive alarm system, the optimal alarm 
system can either use a Monte-Carlo style implementation to empirically 
generate relevant alarm system statistics or use the estimated LDS pa- 
rameters that come strictly from training data as a function of sufficient 
statistics to compute those alarm system statistics numerically without 
relying on Monte Carlo simulations. 

Recall that the adverse event detection problem can be cast as an 
optimal level-crossing prediction problem. In this section, we provide an 
overview of some of the fundamental theoretical and implementation de- 
tails of this method. A level-crossing event, C*,, is defined with a critical 

5 See footnote |4] 
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Figure 8: Functional Architecture, ACCEPT (Adverse Condition and 
Critical Event Prediction Toolbox), Optimal Alarm System Block High- 
lighted 


level, L , that is assumed to have a fixed, static value. The level defines an 
envelope outside of which a critical parameter, y may experience several 
excursions. This parameter can be represented by a dynamic process, 
and is modeled as a zero-mean stationary linear dynamic system driven 
by Gaussian noise. The theoretical underpinnings of this approach are 
based upon this standard representation of the optimal level-crossing 
problem. 

The essence of the optimal alarm system is derived from the use of the 
likelihood ratio resulting in the conditional inequality: P(Ck\yo , . . . , yk) > 
Pb- This basically says “give alarm when the conditional probability of 
the event, C*,, exceeds the level Pb” Here, Pb represents some optimally 
chosen border or threshold probability with respect to a relevant alarm 
system metric. Pb is an extremely important parameter, as it effectively 
defines the space spanned by the alarm region and therefore controls the 
tradeoff between false alarms and missed detections. It is thus chosen 
with respect to these metrics and is the key parameter used for design 
of an optimal alarm system. 

It is necessary to find the alarm regions in order to design the alarm 
system. The event, C&, can be chosen arbitrarily, and is usually defined 
with respect to a prediction window, d, as well as the critical threshold, 
L. In this paper, the event of interest is shown in Eqn. 27 and repre- 
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sents at least one exceedance outside of the threshold envelope specified 
by [— L, L\ of the process y k within the specified look-ahead prediction 
window, d. Mathematically, it can also be represented as the union of 
disjoint subevents, (J^ =1 Sk+j, or as the union of overlapping subevents, 

U?=1 K +j - 

E k +j represents an exceedance at the j th time step, and S k +j is a se- 
quence of such subevents, of which only the j th represents an exceedance. 


where 


d 



C k = \JS k+j 

(27) 


3= 1 



d 



= IK+i 

(28) 


3 = 1 



d 



= P \ P] E k +j 

(29) 


3 = 1 


-S^k+j 

= ilVk+jl < L}, Vj > 1 


Sk+j 

A f E' k+j 3 = 1 

1 iTi -Ek+iiK+j 



Previous work 22 


provides the mathematical underpinnings for the 
optimal alarm condition corresponding to the level-crossing event, shown 
here as Eqn. [30j Alternatively, the optimal alarm condition derived 



can be expressed in terms of the subevents Ek+j, as shown in 


P(Ck\yo, ■ ■ ■ ,Vk) > Pb 

d 

P{ P| E k+j \y 0 , ...,y k ) < 1 - P b 
3= 1 


(30) 

(31) 


As was discussed in 22 , it is not possible to obtain a closed-form rep- 
resentation of the parametrization for the optimal alarm region resulting 
from Eqn. [30| Monte Carlo simulation is thus often used to generate 
alarm system design statistics to be estimated empirically. These es- 
timates are generated by using validation data consisting of example 
adverse events and the conditional probability expressed in Eqn. 30 


However, alarm system design statistics can also be obtained by com- 
puting relevant multivariate normal probabilities which approximate the 
optimal alarm region, evaluated by numerically integrating expressions 
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as provided in 22 1 . Although there were two such approximations stud- 
ied in 1 22], using the one requiring the least computational effort was 
shown not to lose any appreciable accuracy as compared to the other 
approximation. We refer the interested reader to 0 , 1 22], or 1 37] for 
further detail. 


5.4 SPRT Hypothesis Tests 



Detection 

Toolbox 


Figure 9: Functional Architecture, ACCEPT (Adverse Condition and 
Critical Event Prediction Toolbox), SPRT Block Highlighted 


As mentioned in Sec.[lJ SPRT tests are a central part of MSET, and 
have met with quite favorable results. Here we provide more detail on 
SPRT, but full details on SPRT can be found in [38] . The SPRT test 
statistic is a cumulative log-likelihood ratio between the probability dis- 
tributions characterizing anomalous and nominal behavior. The SPRT 
test represents the last item in the preceding categorical list, represent- 
ing the method using an optimal stopping rule based upon the test of 
hypotheses associated with abrupt changes in residuals of the modeled 
process output. 

There are four distinct alternative hypotheses shown as Eqns. 
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37, which are tested using the SPRT statistics that are computed with 
MSET to provide comprehensive coverage. These statistics test the hy- 
potheses for both positive and negative mean drifts, as well as nominal 
and inverse variance shifts. The denominator represents the null hypoth- 
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esis, T~Lo shown as Eqn. 33 and characterizes nominal behavior, while the 
numerator represents the alternative hypothesis, T-Lj and characterizes 
anomalous behavior. 


si 

- S j -fVlog^ 1 ^ jell 4} 

(32) 

p(£i\Po) 

= ^(£i;0,CP**C T + i2) 

(33) 

p{£i\Pi) 

= Mpos, CPxxC T + R) 

(34) 

P{£i\P2) 

= M{Ei\ — M ne g, CPxxC^ + R) 

(35) 

P(e*l^3) 

= AA( £i ;0,K O m(CPx X C T + R)) 

(36) 

p{£i\Hi) 

= v(e,0, CP *f T + fl ) 

\ * inv J 

(37) 

Picx 

G R" xn (solution to DARE) 

(38) 


A theoretically optimal stopping rule, Si > log 1 p Pmd provides the 
best threshold, given user-specified tolerances as established targets for 
both missed detection and false alarm probabilities (as P m d, Pf a respec- 
tively). This class of prediction methods tests shifts in the first and 
second moments of the distribution formed by the innovation signal. 
Positive or negative shifts in the first moment or mean are given by 
parameter M pos and M neg , respectively, and increases or decreases in 
the second moment or variance are given by parameters V nom and V) n „ , 
respectively. 

As with the other detection methods, alarm systems based upon 
testing the SPRT hypotheses will require parametric tuning given by 
the modified objective function in Eqn. [39} 


Eh ElU 0(i,h) A l4=i s i> 1 og 

minimize —7 

E Up 

subject to 
seR 4 

n£ Z+ = [1,2,...] 

where 


(39) 


k) 


s 


A 


Ground truth classification for the k th 
time point in the i th validation record 


Mj 

M, 

V r 


pos 

neg 


nom 

Vin.v 


Note here that the optimization function is now based upon a mod- 
ified Hamming distance metric and expressed as a function of n, M pos , 
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M-negi V r 


in Eqn. 15 


nom, Vinv as distinct from AU C expressed as a function of n and d 
Any number of optimization techniques can be used to solve 


this problem, however in an attempt to find a near global optimum, a 
number of different approaches have been tested, which are provided in 
the list below. 


• Simulated Annealing 

• Genetic Algorithm 

• Multi Start - choosing multiple local initial starting points uni- 
formly, followed by running local optimizations from each of the 
points 

• Global Search - use a serial scatter-search method to find initial 
starting points from which to run local optimizations, followed by 
elimination of the unlikely candidates 

• Pattern Search - gradient-free pattern search via direct objective 
function evaluation 


In order to allow for a fair comparison of all prediction methods, 
we will implement a prewhitening filter for the SPRT-based hypothe- 
ses. This is required in order to meet the assumptions implicit in using 
the SPRT, which is that the residuals are white {i.e. not serially corre- 
lated), and Gaussian. The Gaussian nature of the residuals has already 
been accounted for in part, based upon the discussion of the relation- 
ship between the KS statistic and optimization of the NMSE metric that 
takes place by applying the /-fold cross validation procedure discussed 
in Sec. [TJ Since we are not using the same state estimation procedure as 
MSET, which are specifically designed to yield residuals that are white 
as discussed in |2j, this prewhitening step is compulsory. Conveniently, 
prewhitening occurs naturally via the Kalman filter residual. Thus, the 
associated filter parameters are based upon the same parameters sub- 
sequently used for testing the level-crossing prediction hypothesis. This 
lends itself nicely to a well balanced basis for comparison. 

Appropriately tuned parameter values can be selected by solving the 


for all relevant detection methods, given the appropriate set of optimiza- 
tion arguments. After this step, the corresponding optimized parameter 
valued can be used for final testing and implementation. This step 
is seeded by a disjoint hold out dataset which was not used for either 
training or validation, and presumably contains real examples of adverse 
events that are similar in nature to those used for validation, or more 
formally, derived from the same distribution and data-generating pro- 
cess. The same testing regimen is applied for all selected regression and 

6 and where applicable, alarm design thresholds 


constrained optimization problems posed in either Eqn. [15] or Eqn. 39 
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prediction methods for comparison. From this we can ascertain the best 
combination of regression and detection algorithms to use as dictated by 
the given performance metrics. 

Table 3: Tunable Regression Hyper-paranreters 


Regression 

Method 

Regression Description 

Hyper-parameter 

Hyper-parameter 

description 

SVR 

Support Vector Regression 

cr or C 

Kernel Width or cost 
parameter 

k-NN 

k-NN Regression 

k 

Number of nearest 
neighbors 

LR1 

Ridge (linear) regression using 
linear regressors 

A 

i 2 regularization 
coefficient 

LR2 

Ridge (linear) regression using 
quadratic regressors 

A 

i 2 regularization 
coefficient 

ELM 

Extreme Learning Machines 

n h 

number of hidden 
neurons 

BNN 

Bagged Neural Networks 

”-h 

number of hidden 
neurons 

RANSAC 

RANdom SAmple Consensus 

8 

cost threshold 


6 Conclusions 

We have introduced a new architectural framework that can be run in 
MATLAB® called ACCEPT (Adverse Condition and Critical Event Pre- 
diction Toolbox). An open source release of this package will be made 
publicly available before the summer of 201^j The fundamental purpose 
of releasing this software package is to provide a tool which can be used 
specifically for the prediction or forecasting of adverse events in time 
series data. It offers a single, unifying framework in which to compare a 
variety of combinations of algorithmic approaches, and provides a plat- 
form to act as a catalyst in advancing the state of the art in technologies 
related to this problem. 

7 Source code will be posted at the DASHlink URL: 
https : / /c3 .nasa.gov/dashlink/projects/ 10/resources/ 
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In future work we will also extend the scope of the software to incor- 
porate alternate classes of regression methods ( e.g . streaming, AR/ARMA- 
based, etc.) in the context of the same architectural framework. As such, 
the framework has been designed with flexibility and modular extensi- 
bility to accommodate future innovations. Appendix [D] provides a de- 
veloper’s guide which will provide a way forward towards that end. We 
will also augment the architectural framework to more seamlessly test a 
variety of explanatory feature sets, which will aid in event localization 
and isolation efforts. 
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Appendix A 

MATLAB Toolboxes Required for Full 
Functionality 


• Statistics Toolbox (MATLAB) 

— Basic requirement 

• Optimization Toolbox (MATLAB) 

— Basic requirement 

• Parallel Computing Toolbox (MATLAB) 

— Required for running distributed jobs on cluster (if available) 
to reduce processing time 

• MATLAB Compiler (MATLAB) 

— Required for running distributed jobs on cluster (if available) 
to reduce processing time, if Parallel Computing Toolbox is 
not available 
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• Global Optimization Toolbox (MATLAB) 

— Option to run global optimization routines in support of re- 
gression or SPRT detection optimization 

• Control Systems Toolbox (MATLAB) 

— Required for any detection method using an LDS 

• Neural Network Toolbox (MATLAB) 

— Required for testing BNN (Bagged neural networks) as one 
the regression methods 

Appendix B 

Auxiliary Toolboxes Required for Full 
Functionality 


Please note that all of these package folders must be installed according 
to these directions: 

• Third Party Toolboxes 

— KPMstats and KPMtools Toolboxes (Kevin Murphy, MIT/GPL: 
http : //code . google . com/p/bnt/source/browse/trunk/?r= 

36) 

* Basic requirement, place anywhere on your Matlab path 
— SMO code for SVR routine 

* Place the files listed below under 
$ACCEPT_DIR/regressopt/svropt 

* The following files are required for running SVR as one 
of the regression methods 

■ gsmo . m (and also equivalent mex-compiled version for 
your OS, e.g. gsmo_mex .mexa64, link: 
https : //searchcode . com/codesearch/view/29473653/) 
• kernel. m (and also equivalent mex-compiled version 
for your OS, e.g. kernel .mexa64, link: http://cmp. 
f elk. cvut . cz/ cmp/sof tware/ stprtool/) 

— LibSVRcode for SVR routine (Chih-Chung Chang and Chih- 
Jen Lin, https://github.com/cjlinl/libsvm) 

* Install in $ACCEPT_DIR/regressopt/libsvropt 
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— RANSAC code for regression (Marco Zulianis RANSAC tool- 
box, released under GNU LGPL at https://github.com/ 
RANSAC/RANSAC-Toolbox. and Laurens van der Maatens Mat- 
lab Toolbox for Dimensionality Reduction released under the 
FreeBSD License at http: //lvdmaaten. github . io/drtoolbox/) 

* Install RANSAC toolbox in $ACCEPT_DIR/regressopt/ransac 

* In the Matlab Toolbox for Dimensionality Reduction, lo- 
cate /drtoolbox/techniques/pca.m, place in 
$ACCEPT_DIR/regressopt/ransac, and rename it RANSACpca.m. 

— Kalman Filter Toolbox (Kevin Murphy, MIT/GPL http:// 
code . google . com/p/bnt/source/browse/trunk/?r=36), re- 
quired for any detection method using an LDS, the following 
files need to be placed under $ACCEPT_DIR/ldslearn 

* smooth.update .m 

* kalmanrupdate .m 

* kalman.f ilter .m 

* kalman.smoother .m 

— ASOS Toolbox (James Martens, Apache 2.0, obtained here: 

http : //www. cs .tor onto . edu/~ jmartens/ASOS/) 

* Option for any detection method using an LDS 

* Should be installed under $ACCEPT_DIR/ASOS 

— CVX toolbox (Michael Grant & Stephen Boyd, GNU General 
Public License 2.0, obtained here: http://cvxr.com/cvx/) 

* Option for any detection method using an LDS 

* Should be installed under $ACCEPT_DIR/cvx, using cvx 
installation directions 

* Follow directions in Readme. docx for additional installa- 
tion instructions 

• Toolboxes developed in-house at NASA Ames Research Center 

— Regression Wrapper Code (released w/ACCEPT) 

* Basic requirement 

- SMO code for SVR routine 

* svmregSMO.m is released w/ACCEPT under 
$ACCEPT_DIR/regressopt/svropt (required for running 
SVR as one of the regression methods) 

— Kalman filter augmentation ARC 17528-1 (Rodney Martin, 
not released w/ACCEPT and can be accessed here: https : // 
c3.nasa.gov/dashlink/members/ 10/re sour ces/?type=al) 
(required for any detection method using an LDS, the follow- 
ing files need to be placed under $ACCEPT_DIR/ldslearn) 
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* learnJkalman . m (use modified version released separately) 

* em.converged . m (use modified version released separately) 

— CAMCOS LDS Initialization script (SJSU released to NASA 
w/ ACCEPT) 

* Option for any detection method using an LDS 

— Stanford cvx utility calls (Eric Chu, released to NASA under 
NRA NNX07AE11A) 

* Option for positive definite matrix constraint enforcement 

— ROC Curve Augmentation ARC 17529-1 (Rodney Martin, not 
released w/ACCEPT and can be accessed here: https : //c3 . 
nasa. gov/dashl ink/ members/ 10/resour ces/?type=al) 

* Required for all detection methods except for SPRT 

* Install in $ACCEPT_DIR/detectopt 

— Optimal Alarm Toolbox (Rodney Martin, NOS A, released w/ 
ACCEPT, but can be accessed here: 

https : //c3 .nasa.gov/dashlink/resources/119/) 

* Required for running Optimal Alarm System as one of 
the detection methods 

* If not already part of package, extract under 

$ACCEPT_DIR/ optalarm.osrelease 

— MCR Scheduler toolbox to set up distributed computing w/ 
Matlab compiler (released w/ACCEPT). Option for distributed 
computing only if a computing cluster is available. Set the 
following fields of the sched struct in create JobsNew.m ac- 
cordingly 

* sched. Type=‘ ’ ; 

* sched. JobFolder=‘ ’ ; 

* sched . MCRRoot= ‘ . . /MATLAB_Compiler_Runtime/v7 13/ ’ ; 

* sched. ServerName=‘ ’ ; 

* sched . Submit Argument s= 

‘-1 walltime=168 : 00 : 00 -1 nodes=l :ppn=l ’ ; 

* sched. ResourceTemplate=‘-l nodes=N’ ; 

• Uncategorized Toolboxes 

— LDS stability constraint toolbox (Siddiqi et al. 1 32] , no formal 
license other than a requirement to acknowledge researchers 
by citation of corresponding paper, obtain here: http : //www . 
select . cs . emu . edu/projects/stableLDS/) (required for any 
detection method using an LDS, extract package into existing 
folder $ACCEPT_DIR/stablelds 

* Keep learnCGModelEM .m ! (modified version which should 
already be included in $ACCEPT_DIR/stablelds) 
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* Keep learnCGModelSS .m ! (modified version which should 
already be included in $ACCEPT_DIR/stablelds) 

— N4SID Toolbox (Overschee Sz DeMoor, International, no for- 
mal license, obtained here: http : //homes . esat . kuleuven. 
be/ ~smc/sy sid/software/) 

* Required for any detection method using an LDS 

* Install in $ACCEPT_DIR/ subfun 

* Use existing modified version of subid .m included in AC- 
CEPT package 

Appendix C 

ACCEPT set-up and configuration guide 


Dependent open sourced packages are not included with ACCEPT. 
Instructions for installing the relevant packages and files for all auxiliary 
packages can be found in Appendix |B| 

1. ACCEPT set-up 

(a) Environment variables to set 

i. DATA_DIR = local path containing data repository 

ii. ACCEPT_DIR = local path where you’ve checked out the 
ACCEPT package 

(b) Add entire repository to matlab path (exercise caution with 
the cvx branch: you should only add the root and / functions 
folders) 

(c) Follow directions for installation of auxiliary packages in Ap- 
pendix [Bj 

2. Customized functions (auxiliary files to edit and create). In the hie 
User_input_accept_generic .m you will find the following lines - 
modify as follows and save as User_input_accept .m (without the 
“-generic” suffix): 

(a) params . acceptpath=getenv( ‘ ACCEPT-DIR’ ) ; % No modifi- 
cation necessary 

(b) params . datapath= [getenv( ‘DATA-DIR’ ) ‘/Data’]; % En- 
ter location (path) of data corresponding to adverse events 
and their nominal data counterparts 

i. Partition these datafiles into /Training, /Validation, 
and /Testing sub-directory structures. 

ii. There must be as many training hies as there are valida- 
tion hies. 
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iii. These datafiles should be in *.mat format. 

iv. The * . mat file must contain a struct with the following 
fields at the very minimum: data, header. 

v. The data field must represent a matrix of numerical ele- 
ments, with columns corresponding to the header fields, 
and rows corresponding to the time index. 

vi. The header field must be a cell array containing the names 
of all parameters. 

(c) params . anomalytype= ‘Adverse Event 1’,' Adverse Event 
2’ , ‘Adverse Event 3’ ; % Contains the names of the ad- 
verse events to predict 

(d) params . loadf cn={ ‘Function_l ’ , ‘Function_2 ) , 
‘Function_3’} ; % Contains names of the loading functions 
for data corresponding to a particular scenario ( e.g . operating 
regime) 

i. These functions partition the datasets provided at the lo- 
cation^) provided in params . datapath according to the 
corresponding scenario. 

ii. Example call: 

[header , data, message] = 

Function.l ( ‘nomdataf ile .mat ’ ) ; 

iii. The function must take in as its sole argument the file- 
name of the *.mat file (e.g. nomdataf ile .mat) 

iv. The function must return as output arguments header, 
data, and message. The first two outputs are self- 
explanatory, and the last argument can be used to report 
errors. 

v. The body of the function must include the logic necessary 
to temporally partition out the specific section of the data 
hie that corresponds to the scenario of interest. 

vi. Please note that there is no need to partition based upon 
the parameters, partitioning here is only time-based. 

(e) params . truthf cn={ ‘ GTFunc_l ’ , ‘ GTFunc_2 1 , ‘ GTFunc_3 ’ } ; 

% Contains names of the functions used to establish the ground 
truth for a specific adverse event that occurs within the sce- 
nario 

i. Example call: 
subevent = 

GTFunc.l (Idx , params , obsval , rawdata.val) ; 

ii. The input arguments are as follows: 

A. Idx - integer index of the type of hie being loaded 
(either validation or test), i.e. File # Idx out of X 
total validation or Y total test hies 
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B. params - a struct containing all relevant fields for the 
specific run configuration. It needs to be passed in as 
the second argument and named exactly as “params.” 

C. obsval - indexed structure containing all validation 
or test files, created by an auxiliary ACCEPT func- 
tion. It needs to be passed in as the second argument 
and named exactly as “obsval.” It contains the un- 
normalized residual formed by the regression stage, 
contained in the field obsval. data. The other two 
fields contain ground truth data, obsval . subevent 
contains the “raw” ground truth derived from the 
logical construction of this function, obsval . event 
contains the ground truth vector that is constructed 
based upon the given prediction horizon. 

D. rawdata_val - cell array containing all validation or 
test files, also created by an auxiliary ACCEPT func- 
tion. It needs to be passed in as the fourth and last 
argument and named exactly as “rawdata_val.” It 
contains the raw parametric data in engineering units, 
prior to normalization. The first argument, Idx, in- 
dexes the rawdata_val cell array. 

iii. The output argument is a binary vector representing the 
ground truth status for each time step (0 - adverse event 
absent, 1 - adverse event present). 

iv. The body of the function must include the logic necessary 
to establish the ground truth for a given validation or test 
data file. 

(f) In the file User_input_ressarch_generic .m, you will find 
only the first section needs to be modified if desired, which 
should be clearly demarcated. Modify the paths accordingly 
and save as User_input_ressarch.m (without the “.generic” 
suffix): 

(g) Use the following format and rules for creating the file 
Adverse_Event_X_Parameterlist . txt: 

i. All parameter names must be encapsulated with double 
quotes as such “Parameter 1” 

ii. Each line of the file should only contain a single parameter 
name, followed by a carriage return. 

iii. The listed parameter name must match the name of a 
parameter in the * .mat data file. 

3. Running ACCEPT 

(a) Matlab session 

i. >> ressarch (runs ACCEPT) 
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ii. >> combine_ressarch (combines results from previous 
ACCEPT runs) 


Appendix D 

ACCEPT developer’s guide 


Steps to add a new regression method in ACCEPT using local re- 
sources: 

1. In the regressopt folder 

• Create new folder for the regression method in the regressopt 
folder, $FOLDERNAME (the convention for naming is all common 
letters). 

• There should be at least two functions in this new regression 
methods folder. They should be named buildTEMP.m and 
evalTEMP.m where you exchange ’TEMP’ for the name of your 
regression method e.g buildELM, evalELM (the convention is 
all capital letters). 

• buildTEMP should take in trainData and runOptions as pa- 
rameters and return’s a variable that stores the model which 
was produced by the regression method. 

• evalTEMP should take in the model (the variable that was 
returned from the buildTEMP function above) and tst as pa- 
rameters and return the predicted value(s). 

2. In mainREGcode_ressarch.m 

• Add the name of the regression method to the end of the 
algomame cell array (the convention is all common letters, 
no spaces). 

• Add a new case block for the new regression method in the 
switch statement. It should follow the pattern of the oth- 
ers changing the names to be the new regression method 
where appropriate. Here is where you would place a hyper- 
parameter to tune (see Step 3) and also where to include the 
buildTEMP and evalTEMP functions. 

3. In aux_input . m 

• Add the name of the new regression method to the algotypes 
cell array, it should be the same name as in the algo_name 
array in Step 2. 
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• Add the name of the hyper-parameter that you want to tune 
into the tuneparamtypes cell array (the convention is all com- 
mon letters WITH spaces as necessary). 

• Add all the other regression’s hyper-parameters at the end 
of the last if statement (if any). You must create an if 
statement that follows the pattern of the previous regression 
methods to add the hyper-parameters. To add the hyper- 
parameters it should be of the form params . temp Jiyper where 
you replace temp_hyper with the actual name of the hyper- 
parameter. 

4. In test_loop_ressarch.m 

• Add the name of the new regression method (same name in 
algomame and algotypes) to the appropriate if or elseif 
statement in the if block if params . regress . optIdx==7 

5. In reg_ranges .m 

• Add a value to the end of the tuneminrange array (this the 
minimum value for the hyper-parameter that you want to 
tune) 

• Add a value to the end of the tunemaxrange array (this the 
maximum value for the hyper-parameter that you want to 
tune) 

• Add a value to the end of the tunevals array (this is a default 
value that ACCEPT would use when necessary) 

To add a new regression method in ACCEPT using a distributed pro- 
cessing configuration, include the following steps: 

1. $C0NFIG. Values. job. FileDependencies = ‘regressopt ’ , 
‘regressopt$FOLDERNAME’ . 

2. Save as * .mat (R2011b or earlier) or * . settings (R2012a or later) 
in $ACCEPT_DIR/ACCEPT/Distributed Processing/PCTconf igs 

Appendix E 

Demo of ACCEPT on a sample problem 


E.l Sample Problem Description 

Sustainability Base is a 50,000 square foot LEED Platinum certified 
building located at NASA Ames Research Center. It seeks to expand the 
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possibilities of building sustainably on Earth by utilizing NASA’s own 
innovations and technology. Sustainability Base generates data from 
many systems in the building via sensors and power ports. The data 
being used as the basis for the example problem in this tutorial is mo- 
tivated by a scenario devoted to anomalous drops in room temperature 
in Sustainability Base, and can be found at the DaSHlink website here 
https : //c3 .nasa. gov/dashl ink/re source s/936/, 

E.2 Tutorial 
Preprocessing 

As stated in Appendix C, ensure that the DATAJDIR and ACCEPTJDIR 
environment variables are set to the directories where you have 
stored the data and ACCEPT code respectively, e.g. 

> setenv( , DATA_DIR’ , ’ /home/username/ACCEPT_data , ) 

> setenv ( ’ ACCEPT_DIR ’ , ’ /home/username/ ACCEPT ’ ) 

Next, create params.txt and place it in the data directory. All 
sensors, including both target and input parameters, need to be 
separated by a carriage return (i.e. the enter/return button on the 
keyboard). The sensor names should not be separated by a space. 

For example: 

DC114T 

DCRCS103 

DCTN240T 

In this introductory example we only use continuous-valued data 
from two randomly selected sensors to predict the target (room 
temperature) sensor. The two parameters are used to predict 
the air temperature outside of a conference room on the second 
floor (sensor label: DCTN240T, Room 227). One of the two pa- 
rameters corresponds to a sensor measuring the CO 2 concentra- 
tion for a large conference room on the first floor (sensor label: 
DCRCS103, Room 103), which is served by a traditional HVAC 
system. The other parameter corresponds to a sensor measuring 
the room temperature of an electrical closet on the first floor (sen- 
sor label: DC114T, Room 114). 

Next, create a name for the ground truth function, e.g. ground_truth.m 
and specify that any temperature value falling below 68.1 deg F for 
the target sensor is anomalous. For example: 

temp.col = strmatchf ’DCTN240T ... 

params . header , ’ exact ’ ) ; 

data = rawdataValjidx} ; 

truth = data( : ,temp_col) < 68.1; 

anomldx = f ind(truth==l , 1) ; "/find the first anomaly 


44 


Lastly, create a name for the load function, e.g. load_f unction, m. 
Each *.mat hie contains a structure called Info. Info contains 
two fields: one “header” fields containing all sensor names stored 
in a cell array called colheaders, and another held called data, 
containing all the sensor data in a matrix format. 

load(f ilelist) ; 
header = Inf o . colheaders () ; 
data = Info.dataO; 
message = true; 

Running ACCEPT 

Ensure that the correct paths are added to MATLAB’s path and 
run the ressarch.m function. 

> ressarch 

A dialog box will appear; select No so that new results can be 
created. 



Figure E10: Loading Previous Results 
Next, select Yes to create a new conhguration hie. 



Figure Ell: New Conhguration File 

When asked to pick an anomaly candidate, select Cold Complaints, 
and when asked to select a target parameter, select DCTN240T. 
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When asked to either use the system’s local resources or distributed 
processing via a cluster, select your system’s local resources for the 
purposes of this tutorial (input 2). 

> 1- Use cluster scheduler 2 - Use local resources : 2 

When asked to perform regression only, select No (2). 

> Use regression only ? 1 - Yes, 2 - No : 2 

When prompted to select the optimization method, select fixed 
point in order to set pre-optimized values. Selecting any other 
option would require a great deal more computational effort, and 
the purpose of this simple example is primarily to get the user up 
and running with an example that will complete in a short period 
of time. 


Pick optimization method for regression 



Figure E12: Optimization Methods 


When prompted to select regression methods, choose ’lin’ and 
’elm 



When asked to select the type of detection methods to use, choose 
both ‘redline” methods, e.g. “Redline- Training” and ’’Redline- 
Validtion.” 

A prompt will appear to allow for entering the optimized values 
for the selected regression methods; for ‘lin’ enter 0.00001 and for 
‘ELM’ enter 481 (refer to section [4] for further explanation of these 
hyperparameters). A series of questions based on the detection 


46 




Figure E14: Detection Methods in ACCEPT 


methods that were selected will then follow. Use the inputs shown 
in Figure E15 to answer these questions. See section [5] for details 
on the context of the first four questions shown in Fig. |E15 The 
remaining two questions can be addressed as follows: 


1. Enter resolution (number of points) for Monte Carlo-based 
integration (smoothness factor): This option is used for ROC 
curve construction based upon methods that rely on a model- 
based approach to generate the relevant ROC curve statis- 
tics/performance metrics (labeled X-Training and discussed 
in Sec. [5jl. A good default setting for this value has been found 
to be 3600, however in general smaller values yield a trade-off 
between less accurate residts and less time for the integration 
to complete. Similarly, higher values yield more accurate re- 
sults and take more time to complete. The purpose of Monte 
Carlo integration has been discussed in Sec. [5[ 

2. Resolution of ROC curve: Construction of an ROC curve us- 
ing “ training data only” means that it does not rely on any 
ground truth labels, and typical computational shortcuts can- 
not be employed. As such, we must find an alternate way to 
create the ROC curve (please see Algorithm^for a procedural 
summary to accompany the subsequent narrative). A heuris- 
tic approach is taken by splitting the domain of P^ G [0, 1] 
or L a 6 [0, L arnax \ into two distinct subdivisions. Indepen- 
dent grids are then established for each subdivision, and false 
alarm/detection probabilities will be computed for each of the 
points in the grid. Pre-specified tolerance criteria associated 
with these probabilities will be used to terminate the process 
before continuing on to create twice as many new subdivisions 
as the last time. The final result will yield an ROC curve 
with sufficient resolution meeting the specific tolerance crite- 
ria. Refer to Algorithm\^for the pseudo-algorithmic procedure 
employed in this process. Clearly, creation of these subdivi- 
sions has exponential complexity, so the user should be cau- 
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tious about using too high of an integer value. We have found, 
that a value of 10 typically provides sufficient resolution. 


Algorithm 1 ROC Curve Construction 
1: function ROC(e, tol ) 

2: e Doroi([e 0 i €-max\) 

3: stop N > 3600 V max \A(Pf a ^\ < .01 \J max | A (P m d) \ < -01 

4: while not (stop) do 

5: if i < tol + 1 then 

6: Dom i+ i Um=l Domm.([em- 1 , ^f] U e m])) 

7: else 

8: Dom,i + 1 <— Um=i ^c»m m ([arg min e -> stop, arg max e -i stop]) 

9: fa getFalseAlarms(Dom,i + i) 

10: fd <— getFalseDetections(Domi + i) 

11: if-i + 1 

12 : fa 4— cleanFalseAlarms(Dom, en d) 

13: fd <— cleanFalseDetections(Dom en d) 

14: fa <— sortFalseAlarms(Dom en d) 

15: fd sortFalseDetections(Dom en d) 


Provide a descriptive name after being prompted to save the con- 
figuration. Now ACCEPT will run and produce the desired results. 

» ressarch 

1 - Use cluster scheduler, 2 - Use local resources : 2 
Use regression only ?: 1 - Yes, 2 - No: 2 

Global Optimization Toolbox not detected, ACCEPT will use either a grid search for optimizing the regression 
Design Alarm System by constraint on 1 - False Alarm Rate, 2 - Missed Detection Rate, 3 - Equal Tradeoff : 3 
Use ASOS approximation for LDS learning ? 1 - Yes, 2 - No : 2 
Maximum state order = : 2 

What is the maximum design (and validation] prediction horizon ? : 30 

Enter resolution (number of points) for Monte Carlo-based integration (smoothness factor) : 3600 

Resolution of ROC curve (bits) : 10 

Figure E15: Inputs for ACCEPT 
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E.3 ACCEPT Results 


The following table shows various runtimes for the most computation- 
ally intensive elements of ACCEPT. The values are in seconds, and are 
approximated to the nearest second. 


Method 
Linear ELM 

Regression 

1.8 

1.1 

LDS Parameters 

90 

206 

Detection 

5.8 

5.9 

Total 

97.6 

213 


Table E4: Time to Run (seconds) 


Learning the linear dynamical system is where most of the compu- 
tation lies for this problem. This is due to the fact that there are 0(n 2 ) 
parameters in 9 to learn that are associated with the LDS learning pro- 
cess. This stands in distinction to the regression and detection parts of 
the pipeline which do not suffer from the same level of computational 
complexity. For more detail on the complexity associated with LDS 
learning, refer to |28| and j30|. 


Results 


Method 


Linear 

ELM 

Regression 

Global Optimum 
Optimized Values 

l.le-05 

0.0000 

481 

0.0001 

Missed Detection 

Redline - Training 
Redline - Validation 

0.0000 

0.0088 

0.0088 

0.0088 

False Alarm 

Redline - Training 
Redline - Validation 

0.0476 

0.0022 

0.0108 

0.0108 

Detection Time 

Redline - Training 
Redline - Validation 

2.0000 

0.0000 

0.0000 

0.0000 


Table E5: Results From ACCEPT 


From Table IE5 


it is clear that the performance of using either linear 
regression or ELM regression is very similar, and the small differences 
are negligible with respect to system performance. As such, Figs. E16 


and Fig. E17 illustrate only the results for linear regression. 

The reproduction of similar results should be possible by following the 
step-by-step instructions provided for this same exercise. Please contact 
the authors if dramatically different results were generated, which may 
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Pja and P ln „ 


A vs. L a 



0.6 • Training P = 0.0020102 at L a = 0.0057216 

Training P (> = 0.0020006 at = 0.0057216 
0.4 



L (Alarm Threshold) x 10 


(a) Redline - Training 


P/„ and P m d vs. L a 



L (Alarm Threshold) 


(c) Redline - Validation 



(b) Redline - Training 



(d) Redline - Validation 


Figure E16: Detection Results for Linear Regression (LR1), Nominal 
Realizations 




(c) Redline - Validation (d) Redline - Validation 


Figure El 7: Detection Results for Linear Regression (LR1), Anomalous 
Realizations 


be possibly be due to inconsistencies between the version of the code 
posted to DaSHlink and the ones used to generated the results shown 
here. Although highly accurate and robust advance prediction capability 
was demonstrated here with “toy” scenario that relies on real data, it 
can also be employed for more realistic scenarios relating to safety and 
even for other types of mission critical systems. Ultimately, ACCEPT 
will not only enable real-time advance prediction of critical events, but 
will also allow for an enhanced ability to choose the best combination 
of algorithms to address the needs of application-specific performance 
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requirements. 
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